-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26680][SQL] Eagerly create inputVars while conditions are appropriate #23617
Conversation
Fix scope of match Add test
@@ -159,6 +156,10 @@ trait CodegenSupport extends SparkPlan { | |||
BoundReference(i, attr.dataType, attr.nullable).genCode(ctx) | |||
} | |||
} | |||
} match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this construction is confusing. Can this be split into two statements rather than chain a block and match?
Test build #101544 has finished for PR 23617 at commit
|
aside from sean's comments, lgtm. Also cc @hvanhovell since you reviewed #22789 for SPARK-25767 |
Test build #101553 has finished for PR 23617 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…opriate ## What changes were proposed in this pull request? When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs. This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths. ## How was this patch tested? SQL unit tests new test python tests Closes #23617 from bersprockets/SPARK-26680_opt1. Authored-by: Bruce Robbins <[email protected]> Signed-off-by: Herman van Hovell <[email protected]> (cherry picked from commit d4a30fa) Signed-off-by: Herman van Hovell <[email protected]>
Merged to master/2.4. If we need this in 2.3 as well, then we need to make a backport that fixes the conflicts. |
…opriate When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs. This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths. SQL unit tests new test python tests Closes apache#23617 from bersprockets/SPARK-26680_opt1. Authored-by: Bruce Robbins <[email protected]> Signed-off-by: Herman van Hovell <[email protected]>
@hvanhovell @srowen @squito Thanks! I will make a PR for branch-2.3 later today. |
…s while conditions are appropriate ## What changes were proposed in this pull request? Back port of #22789 and #23617 to branch-2.3 When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs. This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths. ## How was this patch tested? SQL unit tests new test python tests Closes #23642 from bersprockets/SPARK-26680_branch23. Authored-by: Bruce Robbins <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…opriate ## What changes were proposed in this pull request? When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs. This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths. ## How was this patch tested? SQL unit tests new test python tests Closes apache#23617 from bersprockets/SPARK-26680_opt1. Authored-by: Bruce Robbins <[email protected]> Signed-off-by: Herman van Hovell <[email protected]>
…opriate ## What changes were proposed in this pull request? When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs. This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths. ## How was this patch tested? SQL unit tests new test python tests Closes apache#23617 from bersprockets/SPARK-26680_opt1. Authored-by: Bruce Robbins <[email protected]> Signed-off-by: Herman van Hovell <[email protected]> (cherry picked from commit d4a30fa) Signed-off-by: Herman van Hovell <[email protected]>
…opriate ## What changes were proposed in this pull request? When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs. This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths. ## How was this patch tested? SQL unit tests new test python tests Closes apache#23617 from bersprockets/SPARK-26680_opt1. Authored-by: Bruce Robbins <[email protected]> Signed-off-by: Herman van Hovell <[email protected]> (cherry picked from commit d4a30fa) Signed-off-by: Herman van Hovell <[email protected]>
What changes were proposed in this pull request?
When a user passes a Stream to groupBy,
CodegenSupport.consume
ends up lazily generatinginputVars
from a Stream, since the fieldoutput
will be a Stream. At the timeoutput.zipWithIndex.map
is called, conditions are correct. However, by the time the map operation actually executes, conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially createdinputVars
. As a result, a StackOverflowError occurs.This PR ensures that
inputVars
is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creatinginputVars
fromoutputVars
(SPARK-25767). I simply extended the solution for that code path to encompass both code paths.How was this patch tested?
SQL unit tests
new test
python tests